DOCUMENT RESUME

ED 181 3«3                                        CG 014 055

AUTHOR          Jung, Steven R.
TITLE           Trying out Activities and Monitoring Early
                Implementation Efforts. Module 10.
INSTITUTION     American Institutes for Research in the Behavioral
                Sciences, Palo Alto, Calif.
SPONS AGENCY    Office of Education (DHEW), Washington, D.C.
PUB DATE        Jul 78
NOTE            7 Bp*; For related documents see CG 01 M 014 9*057, ED
                12U 823 and ED 133 511. Reprint.
AVAILABLE FROM  National Consortium Project, American Institutes for
                Research, P.O. Box 1113, Palo Alto, CA 94302
                ($3.20)
EDRS PRICE      MF01 Plus Postage. PC Not Available from EDRS.
DESCRIPTORS     *Career Development; Counselors; *Evaluation
                Criteria; Evaluation Methods; Guidance Personnel;
                Inservice Programs; *Occupational Guidance;
                *Performance Factors; Professional Development;
                Program Design; *Program Development; *Program
                Effectiveness; Secondary Education

ABSTRACT 

This staff development module is part of one of three
groups of career guidance modules developed, field-tested, and
revised by a six-state consortium coordinated by the American
Institutes for Research. This module is the tenth in a series on
developing a comprehensive career guidance program at the high school
level, designed to aid guidance personnel responsible for developing
student-focused programs. The goal of this module is to convince
users of the value of the empirical approach to program development,
and to impart those skills required to conduct and measure the
effects of activity tryouts and early implementation efforts. The
module format consists of an overview, goals, objectives, outline,
time schedule, glossary, readings, skill development activities, and
bibliography. A Coordinator's Guide is also included with detailed
instructions for presenting the module in a workshop setting as well
as the facilitator's roles and functions, and the criteria used in
assessing the participants' achievement of module objectives.
(Author/HLM)



* Reproductions supplied by EDRS are the best that can be made *
* from the original document.                                  *



ERIC 




MODULE 10 



TRYING OUT ACTIVITIES AND MONITORING 
EARLY IMPLEMENTATION EFFORTS 



Steven M. Jung

"PERMISSION TO REPRODUCE THIS 
MATERIAL IN MICROFICHE ONLY 
HAS BEEN GRANTED BY 

A.I.R.



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)." 



Developed at the American Institutes for Research, under support by the 
United States Office of Education, Department of Health, Education, and 
Welfare under Part C of the Vocational Education Act of 1963. 

August 1975 

Reprinted with minor revisions, July 1978




ISBN 0-89785-574-4 

Illustrations in this module are by Jurgen Wolff. Editorial work is 
by Charles Dayton and Phyllis DuBois. 

This module builds on cooperative staff development activities of: 
Mesa Public Schools; Mesa, Arizona,
Fremont Union High School District; Sunnyvale, California,
Santa Clara County Superintendent of Schools Office; San Jose, California, and
American Institutes for Research; Palo Alto, California



Support for these efforts was received through the United States Office
of Education, Department of Health, Education, and Welfare under Part C
of the Vocational Education Act of 1963 and Title III of the Elementary
and Secondary Education Act of 1965. Points of view or opinions stated
do not necessarily represent U.S.O.E. position or policy.






TABLE OF CONTENTS

INTRODUCTION
    Module Goal and Outcomes
    Module Outline
    Model and Module 10
    Glossary

TEXT
    Setting the Stage
        When and when not to pilot test
        Discussion on the concept of "subset optimization"
    Planning the General Strategy of Pilot Testing
        Before the pilot test
        Preparing to pilot test
        Discussion on a general strategy of pilot testing
        Activity 1 - Analyzing a program's internal logic
    Estimating Costs
        Activity 2 - Estimating Costs
    Monitoring During Early Implementation Efforts
    Measurement Techniques
        The four basic techniques
        Considerations of adequacy
        Activity 3 - "Really understanding" measurement and
            measurement techniques
    Results of Preliminary Tryouts and Early Implementation
    Summary
    Discussion Questions

POSTASSESSMENT

APPLICATION

APPENDIX
    Optional Group Simulation Description
    References

COORDINATOR'S GUIDE



INTRODUCTION 



MODULE GOAL AND OUTCOMES 

To convince you of the value of the empirical approach to program
development and to impart certain skills required to conduct and measure
the effects of activity tryouts and early implementation efforts.

Module Outcomes

Following completion of the learning activities of this module, 
you should be able to perform the following: 

1. State the primary purpose of carrying out preliminary activity 
tryouts and describe at least two situations in which such 
pilot testing is useful and two in which it is a waste of time. 

2. Verify the internal logic of a planned career guidance activity, 
given a written description of such an activity covering its 
goals, student performance objectives, and process objectives. 

3. Develop a relatively accurate estimate of the cost of imple- 
menting a career guidance activity. 

4. Develop measures of attainment of the process objectives of
a career guidance activity, given a written description of
the activity and its process objectives.

5. Develop relatively objective, reliable, and valid measures of 
the outcomes of a career guidance activity. 

6. Develop a plan for trying out activities and monitoring early 
implementation efforts at your school. 



The criteria for attainment of the outcomes are available in the 
Coordinator's Guide for Module 10. 






MODULE OUTLINE

Approximate
Time          Activity                                        Outcomes

15 minutes    Introduction
              The Coordinator will explain the basic
              purpose of this module. Several additional
              textbooks are referenced herein and may be
              used for extended learning. The basic
              outcomes of the module, however, can be
              attained without these texts.

3 hours       Text                                            1-7
              Presentation of important information and
              opportunities to practice skills related
              to activity tryouts and early implementation
              monitoring.

1/2 hour      Postassessment                                  1-7
              Assessment of your acquired knowledge and
              skills.

1 hour        Application                                     8
              Planning for your own tryouts and monitoring
              efforts.


MODEL AND MODULE 10 



The model on the next page shows the relationship among the
modules. Trying Out Activities and Monitoring Early Implementation
Efforts comes at the end of Phase 3 and is Module 10.

This module deals with the tasks important in the early stages
of actual implementation. Thus, it builds on the process objectives
dealt with in Module 8, while foreshadowing the summative evaluation
to come later. It is the bridge between looking forward and looking
back, between preparing and assessing results.



[Figure: A Model for Developing Comprehensive Career Guidance Programs]



GLOSSARY

- A property of a measurement which refers to the degree to
which the technique is inexpensive, easy to administer, and easy to score.

Evaluation design - An arrangement of persons, an activity, and measures
of effect so that inferences can be made about the probable effects of
the activity.

Formative evaluation - A process of collecting and using information
during program development in order to improve the functioning of
the program.

Monitoring - A process of conducting data collection activities during
early program implementation to permit immediate revisions in the
program.

Non-reactivity - A property of a measurement technique which refers to
the absence of undue influence upon the person to whom it is applied.

Objectivity - A property of a measurement technique which refers to the
degree to which the technique produces the same score regardless of who
applies it.

Pilot testing - A process of conducting limited tryouts of specific
activities that have been tentatively adopted as a result of planning
and design procedures.

Reliability - A property of a measurement technique which refers to the
consistency of its scores and their relative freedom from chance variation
over time.

Subset optimization - Dr. Richard Schutz's phrase for ensuring that
subcomponents of a program are performing optimally.

Summative evaluation - A process of collecting information to facilitate
judgments about the overall worth of a program; especially appropriate
to later implementation stages.

Validity - A property of a measurement technique which refers to the
degree to which the technique actually measures what it is supposed to
measure, intuitively and by empirical demonstration, thus yielding a
relatively "true" score.






SETTING THE STAGE



A. Definitions
   1. Formative evaluation
   2. Pilot testing

B. When and when not to pilot test
   1. Four scales for rating activities



As you have no doubt learned from earlier modules, two
of our basic themes are continuously trying out activities
and making modifications based on obtained results. These
themes reflect a basic commitment to the virtues of
continual "formative" evaluation, a term coined by Professor
Michael Scriven (1967) to refer to a process of collecting
and using information to improve the functioning of
educational programs. The concept of formative evaluation is
the subject of an article by Garth Sorenson that you may
wish to read. It is included in the Appendix of this
module.

In this module, we are interested in application of
formative evaluation procedures to a very specific stage
in the program planning and development process: the stage
which follows the determination of goals, objectives, and
intervention strategies and precedes any summative
evaluation of these strategies. This stage includes pilot
testing and monitoring early implementation efforts.

Pilot tests are limited tryouts of specific activities
which have been tentatively adopted as a result of the
previously discussed planning and design procedures. These
tryouts should usually precede larger scale (i.e.,
schoolwide or systemwide) program implementation efforts.
However, because of time or money limitations, it may be
necessary to proceed to actual program implementation
efforts without pilot tests of component activities.



Formative Evaluation Defined

Pilot Testing Defined



During early implementation efforts, in which all
components of the career guidance program are being tried out
in concert for the first time, it is important to engage in
data-collection activities designed to permit immediate
revisions. Such activities have been given the term
"monitoring" in this module. Summative evaluation, designed to
facilitate judgments about the overall worth of the program,
should not begin until the program components have had an
opportunity for revision based on monitoring results. If
summative evaluation is attempted prematurely, because of
program accountability demands for administrators, for
example, it is possible that correctable flaws in design or
implementation will cause premature termination of the
program.

When and When Not to Pilot Test

Pilot testing does require time and money which might
well be devoted to other pursuits. Moreover, pilot tests
rarely result in dramatic improvements for activities which
are already well conceived and planned; smaller,
incremental improvements can be expected.

Thus, pilot testing of planned activities is not
always desirable. If an activity is not intended to be
repeated, there is little use for preliminary tryout and
formative evaluation. In other words, one-shot events, or
activities which depend for their implementation on unique
situations, spontaneity, etc., probably should not be pilot
tested. A good example of this is the "Career Night" type
of activity.

Similarly, activities which are already in their final
form, and for which there is no intent of revision, should
not undergo pilot testing. These would normally be
traditional activities whose substance is dictated by precedent
rather than by intent to produce prespecified student
outcomes, e.g., an initial orientation assembly.



Monitoring Defined

Don't pilot test if activity is not replicable

or not revisable



Activities which should receive highest priority in
conducting pilot tests are those which are most crucial to
the success of the overall career guidance program and
about which there is the most uncertainty in terms of
desired outcome attainment.

In determining whether or not to conduct empirical
pilot tests, it might be useful to rate the activities you
have selected for implementation on four simple scales such
as the following:



Do pilot test activities which are most crucial
and will produce uncertain effects

Four scales for rating activities



REPLICABILITY
   Low end (1):  There is no intent to repeat this activity.
   High end:     If this activity proves successful, it will be
                 used repeatedly in the future.

REVISABILITY
   Low end (1):  The structure of this activity is inviolate.
   High end:     There is a strong desire to modify and improve
                 this activity.

IMPORTANCE
   Low end (1):  The success of this activity is not crucial to
                 the success of the career guidance program.
   High end:     The success of this activity is absolutely
                 essential to the success of the career guidance
                 program.

UNCERTAINTY
   Low end (1):  There is no question about the potential effects
                 of this activity.
   High end:     There is considerable doubt about the potential
                 effects of this activity.

Sum the four ratings for each activity and rank order the
sums. Activities which score highest should receive whatever
time and financial resources are available for pilot testing.
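
The summing and ranking step can be sketched in a few lines of Python. This is only an illustration: the 1-to-5 range is an assumption (the module's scales show only the low end as 1), and the activity names and ratings are hypothetical.

```python
from dataclasses import dataclass

# Assumed scale range; the module shows 1 as the low end only.
SCALE_MIN, SCALE_MAX = 1, 5


@dataclass(frozen=True)
class ActivityRating:
    """Ratings of one planned activity on the four scales."""
    name: str
    replicability: int
    revisability: int
    importance: int
    uncertainty: int

    def total(self) -> int:
        """Sum of the four ratings, after checking each is on the scale."""
        scores = (self.replicability, self.revisability,
                  self.importance, self.uncertainty)
        for score in scores:
            if not SCALE_MIN <= score <= SCALE_MAX:
                raise ValueError(f"rating {score} is off the scale")
        return sum(scores)


def rank_for_pilot_testing(ratings):
    """Return the activities ordered from highest to lowest summed rating."""
    return sorted(ratings, key=lambda r: r.total(), reverse=True)


# Hypothetical ratings for three activities.
activities = [
    ActivityRating("Career Night", 1, 2, 3, 2),
    ActivityRating("Orientation assembly", 5, 1, 2, 1),
    ActivityRating("Vocational training handbook", 5, 4, 5, 4),
]
ranked = rank_for_pilot_testing(activities)
for activity in ranked:
    print(activity.total(), activity.name)
```

Activities at the top of the printed list would be the first candidates for pilot testing; the one-shot and fixed-form activities fall to the bottom, as the text suggests they should.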






Dr. Richard Schutz coined the phrase "subset optimization" in a
1970 address to the American Educational Research Association.



The final point involves the distinction between comparative
and cumulative experimentation. The experimental tradition
in the behavioral sciences is comparative. One compares
effects of different phenomena introduced concurrently or
simultaneously. An equally venerable experimental tradition
involves comparisons over time which cumulate in more optimal
performance. This tradition has often been rejected in
education because of its industrial connotations. Cumulative
optimization methodology can, however, be applied to educational
endeavors without considering people as machines, just as
comparative experiment methodology may be applied without
considering people as fertilizer.

The point revolves around the concept of subset optimization.
It is both reasonable and necessary to use subset criteria.
For example, although specified changes in pupil behavior
represent the ultimate criterion which instructional
development is attempting to optimize, it is very unwise to use this
as the sole feedback basis. For example, our staff have, at
times, been very disappointed when specific procedures such
as teacher training, audiovisual segments, etc., have not
improved pupil performance. One could throw out the prototype
and look in a different area. But on closer examination and
analysis, we have each time determined that the intended
function was itself not being performed by the instructional
components. That is, the teachers learned nothing or the
wrong things from the training, the audiovisual segments were
being "misused", etc. With subcomponents performing optimally,
one has a much better likelihood of accomplishing larger
functions optimally. Optimization is unlikely achievable with
unreliable subcomponents. This sounds obvious, but it is almost
universally overlooked in education.¹



¹Richard Schutz, "Programmatic Instructional Development", Paper delivered
at the 55th annual meeting of the American Educational Research Association,
Minneapolis, Minnesota, March 5, 1970.






Do you agree with the last sentence above? Can you cite exam- 
ples from your own experience where (1) the improper functioning of 
some component of a program has detracted from the effects of the 
program as a whole; and/or (2) the improvement of some component of 
a program has improved overall program functioning? If you can cite 
examples, please list them on the blackboard and discuss the actual
circumstances with others in the group. 






PLANNING THE GENERAL STRATEGY OF PILOT TESTING 



A. Reviewing internal logic

B. Preparing pilot test
   1. Selecting a sample
   2. Designing evaluation



Before the Pilot Test

Prior to pilot testing, it is almost always useful to 
perform a brief review of the internal logic of planned ac- 
tivities. Assuming that program goals, student performance 
objectives, and process objectives have been developed in 
accordance with techniques suggested in preceding modules, 
this review can be relatively simple and straightforward. 
It should include the following questions, roughly in this 
sequence: 

1. Do the planned activities relate to the intended
student performance objectives? Is there a reasonable
chance that the activity, if carried out as
indicated, will produce the desired outcomes?

2. Do the proposed student performance objectives 
relate to the program's stated goal or goals? 

3. Is there reason to believe that most members of the 
target audience can already perform the student 
performance objectives, making the planned activi- 
ties unnecessary? 

4. Are the activities described specifically enough to 
be observed? Can their attainment be documented? 

5. Is there an indicated time schedule or sequence for 
multiple activities? 

6. Are the planned activities and sequence practical 
with regard to the constraints likely to exist in 
this situation? 



7. Are the planned activities and sequence appropriate
to the level of the target audience?

8. Are there likely to be side effects which developers
have probably not anticipated?



Review Internal Logic

From the above, it may be seen that initial pilot 
testing precedes actual empirical tryouts. Negative answers 
to the above questions should be pursued with designers 
until a satisfactory positive response can be identified. 

Preparing to Pilot Test 

Normally, early tryouts of activities should be low
cost, involving few students and abbreviated time schedules.
Although these limitations are naturally relative to the
scale of the activity in question, tryouts requiring more
than ten classes and/or five days are extremely rare. The
general strategy of pilot testing is to implement the planned
activity under well-controlled circumstances, using members
of the target population, collecting (1) objective*
information on degree of attainment of process and student
performance objectives, and (2) any other objective or
subjective information which is likely to be useful in indicating
changes that can potentially be made in the activity. Care
must be taken to avoid the tendency to overcollect data at
this stage, since the range of available modifications is
likely to be limited, and data-processing and analysis
procedures have to be rapid so that immediate feedback can be
obtained.



*By objective in this context, we mean a property of outcome
measures, to be explained later, which refers to the degree
to which a test or scale produces the same score regardless
of who applies it. This is not to be confused with
objective in the context of a statement of desired outcomes.



A General Strategy of Pilot Testing

"Less is usually more"




Select a sample for the tryout. In carrying out a
pilot test of the selected activities, you are interested
in inferring from the results of the tryout the probable
results of actual large-scale implementation with your
target population. Therefore, it is essential to select
persons for the tryout who are at least broadly
representative of the target population. In practice, this often
means identifying persons who may be predicted, because of
past performance, to do exceptionally well, poorly, and
average on the activity. Strict random sampling procedures,
in which every person in the target population has an equal
chance of being selected for the tryout, are rarely used
because of the low probability that small random samples
will contain representatives of the extremes of the
characteristics (ability, experience, motivation, etc.) which are
presumed to cause above- or below-average performance.
When reporting on the results of a tryout, it is important
to describe the characteristics (age, sex, prior experience
and achievement levels, grade, etc.) which were used in
identifying the tryout sample.
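
The idea of deliberately including the extremes, rather than relying on a small strict random sample, can be illustrated with a short Python sketch. The three level labels, the roster, and the function name are hypothetical; how students are assigned a predicted level is left to local judgment.

```python
import random
from collections import defaultdict

LEVELS = ("high", "average", "low")  # assumed labels for predicted performance


def select_tryout_sample(students, per_level, seed=0):
    """Pick `per_level` students from each predicted-performance level.

    `students` is a list of (name, level) pairs. Selection within a level
    is random, but every level is guaranteed to be represented.
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for name, level in students:
        by_level[level].append(name)

    sample = {}
    for level in LEVELS:
        pool = by_level[level]
        if len(pool) < per_level:
            raise ValueError(f"not enough students predicted '{level}'")
        sample[level] = sorted(rng.sample(pool, per_level))
    return sample


# Hypothetical roster: few students at the extremes, many in the middle.
roster = (
    [(f"H{i}", "high") for i in range(3)]
    + [(f"A{i}", "average") for i in range(24)]
    + [(f"L{i}", "low") for i in range(3)]
)
tryout = select_tryout_sample(roster, per_level=2)
print(tryout)
```

With this roster, a strict random sample of six students would often contain no predicted low or high performers at all, which is the problem the text describes. The characteristics used to assign levels should still be reported with the tryout results.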

Strive for Broad Representation of Target Population

Evaluation Design Defined



Evaluation design. The word design in this context
refers to an arrangement of persons, an activity, and
measures of effects so that inferences can be made about
the probable effects of the activity on a larger group of
similar persons. Design problems can quickly become very
complex, especially in educational experiments which are
conducted to demonstrate the relative effectiveness of
competitive experiments. Comparative experimental
evaluation designs, however, have no place at this particular
stage of the program planning and development process. Two
so-called "pre-experimental" evaluation designs (Campbell
and Stanley, 1966) are normally sufficient for the
inferential needs of this stage. These are the "one-group
post-test only" design and the "one-group pre-test-post-test"
design. These may be diagrammed as follows, where X stands
for implementation of the activity and O stands for outcome
measurement.

   One-group post-test only design            X  O
   One-group pre-test-post-test design    O1  X  O2

The major tradeoff in selecting the latter design over the
former is the value of having pre-implementation information
versus the danger of producing effects solely through
administration of the pre-test. The latter dangers can be
reduced by using nonreactive measures, which are discussed
in the next sub-section. Thus, the one-group pre-test-post-
test design is recommended for most tryout applications.
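
As a rough illustration of what the one-group pre-test-post-test design yields, the following Python sketch compares O1 and O2 scores for the same tryout participants. The scores and the function name are hypothetical, and the summary is descriptive only: with no comparison group, a gain cannot be attributed to the activity alone, and part of it may reflect the pre-test itself.

```python
from statistics import mean


def summarize_pre_post(pre, post):
    """Describe change between pre-test (O1) and post-test (O2) scores.

    `pre` and `post` must list scores for the same persons in the same order.
    """
    if not pre or len(pre) != len(post):
        raise ValueError("need matching, non-empty pre and post score lists")
    gains = [after - before for before, after in zip(pre, post)]
    return {
        "pre_mean": mean(pre),
        "post_mean": mean(post),
        "mean_gain": mean(gains),
        "n_improved": sum(gain > 0 for gain in gains),
    }


# Hypothetical scores for five tryout participants.
pre_scores = [4, 6, 5, 3, 7]
post_scores = [7, 8, 5, 6, 9]
summary = summarize_pre_post(pre_scores, post_scores)
print(summary)
```

Keeping the summary this small is deliberate; it fits the advice above to avoid overcollecting data and to keep analysis rapid enough for immediate feedback.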



The "Best" Design



Discussion on a General Strategy of Pilot Testing



Please read the following fables.

(1) In May, a team of developers at Research Associates, Inc. 
was given a contract to prepare an innovative career education 
curriculum for seventh graders, on the understanding that the 
developers would be able to deliver the following March. The 
team members got to work at once, but since summer was over by 
the time they finished the materials and developed the accom- 
panying book and A-V support materials, the team had everything 
produced and began a full field test in August without bothering 
to try the components out first. When results were finally in, 
the field test unfortunately revealed a number of serious prob- 
lems. The revisions required were so extensive that the materials 
could not be completed by the deadline, and it was therefore the 
following autumn before the program could actually be implemented 
in the schools. 

It is not certain, but it is at least possible, that a tryout 
would have revealed some of the problems before so much work had 
been done. 



Moral: Shortcuts can produce long delays. 



(2) In response to a directive from the school board to emphasize
career education, a group of teachers at Urban High School got
together to develop a career handbook. They carefully specified
their objectives, chose instructional techniques, and tested their
book with typical students when they were done. The students liked
it, and testing showed that they were able to master the material.

However, when those same students got out of high school and began 
looking for jobs, they discovered that the employment picture was 
vastly different from that described by the handbook. When they 
revisited the high school and told the teachers who had developed 
it, the developers were extremely shocked and puzzled. What 
happened? 

Admittedly, employment is an area in which changes occur almost daily;
however, the developers failed to consult experts in the field while
they were designing the handbook, and they did not have it reviewed
by people conversant with the real situation after they had prepared it.






Moral: Unless you are the One Great Expert on the material you
are presenting, seek an expert opinion on the appropriateness
of your activity's goals and objectives before pilot testing.
It is a truism that if the goals of an activity aren't worth
achieving, then it is uninteresting how well they are achieved.



(3) A Career Education textbook writer recently explained that
he did indeed try out his book before publication. He had
several teachers use it in their classrooms for a semester and
then report their reactions. When asked what aspects of the
book he was interested in evaluating, he replied, "None in
particular. I was interested in reactions in general."

While this approach is better than nothing, that is all it is.
Ideally, a tryout should be predicated on very clear notions of
what aspects of the activity are being examined. Only in this
way is it possible to ensure that important elements are indeed
evaluated.



Moral: If all you are interested in is "reactions in general,"
don't be surprised if your pilot testing yields nothing specific.







(4) [illegible] Associates, Inc. had developed a career education
leader training program with a short film illustrating several
types of leader/group interaction. [illegible] pre-test, got
about two-thirds of the [illegible] still discussing [illegible]
happened to hear a participant remark, "You know, [illegible]
thought the narration [illegible]." [illegible] that from then
on they would [illegible] instrument to collect information on
learner reactions in their assessment.




(5) [illegible] tryout of a self-instructional book intended
[illegible] a room set aside for [illegible] found that the
teachers spent [illegible] of their time drinking coffee and
talking to each other [illegible] comments on the materials
seemed to be a consensus rather than a collection of individual
reactions. [illegible] mentioned that he had helped write the
book, [illegible] criticism decreased perceptibly.



Moral: [illegible] the circumstances of the pilot test,
especially instruction time. Use common sense.



(6) [illegible] was working on a children's game [illegible]
noted the intricacies of each [illegible] the game. She spent
the month analyzing and introspecting about the observation
data and produced a 79-page pilot test [illegible] day after a
major competitor unveiled a similar game in nationwide
Saturday morning TV advertisements.



Moral: [illegible] too much time or resources [illegible].






These fables are presented to illustrate some of the important
aspects of a general strategy of pilot testing. Try to derive such
a general strategy from them and write it down in the space below;
then compare answers with members of your group.

Discuss these questions with the members of your group:
What types of activities seem most appropriate for pilot testing?
Can one-to-one counseling be "pilot tested"? Group counseling?
Why or why not?



ACTIVITY 1 - ANALYZING A PROGRAM'S INTERNAL LOGIC

Please read the description below, which contains a brief written
description of a career guidance program, covering one goal,
two student performance objectives, and two process objectives. Then
construct a written analysis of the internal logic of this program,
listing factors discussed in the text and presenting your judgment of
the program's probable effectiveness with regard to each factor.

The guidance staff of Fliver Junior High School is considering
the implementation of a new career guidance program for ninth graders.

The primary goal of the program is to increase students' awareness
of post-high school vocational training programs which are available
in Fliverville and to increase their knowledge of the minimum entry
requirements for each training program.

Two student performance objectives of the program are as follows:

1. Following the program, each full-term participant will be
able to describe the post-high school vocational training
programs available in Fliverville.

2. Following the program, each full-term participant will be
able to describe the minimum entry requirements for all of
the post-high school vocational training programs in
Fliverville.

To carry out a program designed to help each participating student
meet these objectives, the guidance staff intend to implement the
following activities:

1. A special study carrel will be opened in the guidance office
containing copies of Lovejoy's College Guide and the
Occupational Outlook Handbook. The availability of the carrel and
its contents will be announced to all ninth graders during
the first assembly of the school year.

2. Prior to the Christmas vacation, all ninth graders will be
given a mimeographed handbook on vocational training programs
in Fliverville. The handbook will contain a two-page section
on each of the available programs listing their goals,
current enrollment, minimum entry requirements, length, and any
available information on the success currently being encountered
by graduates of the program. A written certification will be
required from the parent or guardian of each ninth grader to the
effect that the handbook has been brought home and noted by the
parent or guardian; students who have not returned a
certification by January 31 will be called to the guidance office for
a special meeting to go over the handbook.


ESTIMATING COSTS



A. Five basic cost categories

B. Calculation of rates



One of the most useful purposes a pilot test can serve
is to provide more accurate estimates of the likely
full-scale implementation costs of an activity. As is pointed
out in more detail in the summative evaluation module
(Module 11), program outcome data are almost always
considered in a cost-effectiveness context in making decisions
about overall program worth. It is not unrealistic to
expect preliminary tryouts to point out activities whose
cost-to-effectiveness ratio is not likely to be judged
favorably by decision makers. Steps can then be taken to
either increase (or provide more convincing measures of)
effectiveness or reduce costs if large-scale implementation
is still intended.

Cost estimation proceeds in several steps, each of
which can be made simpler if school or district
accounting procedures allow for the central storage and retrieval
of activity costs. The suggested steps are as follows.

First, establish some general expense categories for 
which costs can reasonably be recorded, such as the follow- 
ing: 

1. Personnel costs, including salaries and fringe
benefits for administrators, counselors, teachers,
specialists, consultants, etc. who are involved
in directly implementing the activities

2. Instructional equipment and materials used in
implementing the activity

3. Other program expenses (e.g., special postage,
data-processing costs, etc.)

4. Building costs (a proportion of total building use 
and maintenance costs) 

5. Fixed administrative charges (a proportion of 
general administrative costs) 



Importance of
Cost Estimates



Five Basic Cost 
Categories 






Second, record all costs expended by the activity
during the span of its tryout. Third, sum category costs 
and try to calculate a daily or weekly unit rate for each 
category or subcategory (e.g., teacher days, counselor 
days, etc.). These rates can then be used for roughly 
projecting costs of full-scale implementation. 
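The rate calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the subcategory labels, and all dollar figures in the tryout log are hypothetical and do not come from the module.

```python
from collections import defaultdict


def unit_rates(tryout_records):
    """Sum tryout costs and units per subcategory and return cost per unit."""
    cost = defaultdict(float)
    units = defaultdict(float)
    for subcategory, n_units, amount in tryout_records:
        cost[subcategory] += amount
        units[subcategory] += n_units
    # Skip subcategories with no recorded units to avoid dividing by zero.
    return {name: cost[name] / units[name] for name in cost if units[name] > 0}


def project_cost(rates, planned_units):
    """Roughly project full-scale cost from unit rates and planned units."""
    return sum(rates[name] * n for name, n in planned_units.items())


# Hypothetical tryout log: (subcategory, units used, dollars spent)
tryout = [
    ("teacher days", 4, 240.0),
    ("teacher days", 6, 360.0),
    ("counselor days", 5, 350.0),
]

rates = unit_rates(tryout)
full_scale = project_cost(rates, {"teacher days": 100, "counselor days": 40})
print(rates)
print(full_scale)
```

Such a projection is rough by design; it assumes that unit rates observed during the tryout carry over unchanged to full-scale implementation.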



Rate Calculation 






ACTIVITY 2 - ESTIMATING COSTS

Your school last year decided to use one classroom as a career 
center to house career related materials, hold classes, show films, 
have discussion groups, and so on. You have been asked to determine 
the economic feasibility of the effort; i.e., specifically, to deter- 
mine the expense for the first year (10 months) of operation. You have 
the following information to go on and can use your judgment to make 
estimates where necessary. The worksheet on the following page should 
prove helpful in making the estimates. 

Personnel costs for the effort included: 

• One paraprofessional at $5.00/hour, six hours a day, five days 
a week, for a 40-week school year, to serve as a librarian/ 
resource person. 

• One day per month of a school principal's time, at $20,000/year, 
to overview and administer the center. 

• One day per month of a consultant's time, at $25,000/year, to plan 
for the materials to be used and help evaluate the center's 
effectiveness. 

• An hour per week of 10 teachers' time, at an average salary of 
$12,000/year, taken from other instruction, to devote to career 
development for their classes. 

• Five hours per week of a counselor's time, at an average salary
of $12,000/year, to provide instruction and lead discussions.

• Volunteered time from community residents, school graduates, and 
students to help with particular presentations, talks, and locally 
relevant programs. 

• One month of a secretary's time, at $9,000/year. 

Equipment and materials costs included: 

• One projector, at $600.

• One filmstrip projector, at $200. 






Activity Cost-Estimate Worksheet

Cost Category               No. of Units    Rate/Unit    Estimated Cost

1. Personnel



2. Equipment/Supplies



3. Other Program Expenses



4. Building Costs



5. Fixed Costs



• One screen, at $50. 

• Two portable cassette tape decks, at $100 each. 

• A supply of career related games and assorted materials, at $200.

• Books and printed materials, at $1,000.

• General supplies (paper, pencils, etc.), at $200 for the year.

The cost of the above seven items may be considered to be spread across
ten years, so that only 1/10 of the total needs to be considered part of
the first year's expenses.

Other Program Expenses: 

• Cost of mailing a letter to all district parents to let them know 
about the center. There are approximately 2,000 such parents.

• Cost of using a two-page hand-scored feedback form to collect 
users' reactions to the center and suggestions for improvement. 
There were approximately 1,000 users the first year. 

• Cost of writing a report at the end of the year on the center's 
effectiveness. Paper and printing costs only need to be figured 
here. The report was approximately 20 pages long, and 100 copies 
were printed. 

For all of the above, figure about $.03 per printed or xeroxed page.

Building Costs:

• A general rule of $1,000 per classroom per year is the means used
to estimate building costs.

Fixed Administrative Charges: 

• This is figured as a 5% additional charge added to the total of 
all of the above categories. 
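As a partial illustration of the worksheet arithmetic, the sketch below totals only those items whose cost is fully determined by the figures given above. It is not a complete answer to the activity: the salaried staff time and the parent mailing are left out because they require judgment estimates (for example, working days per year and postage), so the fixed charge shown applies to this partial subtotal only. Variable names are hypothetical, and amounts are held in whole cents to avoid rounding problems.

```python
# All amounts are in whole cents to avoid floating-point rounding.
PAGE_COST = 3  # $.03 per printed or xeroxed page

# Paraprofessional: $5.00/hour, 6 hours/day, 5 days/week, 40 weeks
paraprofessional = 500 * 6 * 5 * 40

# Seven equipment and materials items, spread across ten years
equipment_prices = [60000, 20000, 5000, 2 * 10000, 20000, 100000, 20000]
equipment = sum(equipment_prices) // 10

# Printing costs stated in the activity
feedback_forms = 2 * 1000 * PAGE_COST   # two pages, about 1,000 users
report = 20 * 100 * PAGE_COST           # 20 pages, 100 copies

# Building: $1,000 per classroom per year, one classroom
building = 100000

partial_subtotal = paraprofessional + equipment + feedback_forms + report + building
fixed_charge = partial_subtotal * 5 // 100   # 5% of the partial subtotal only


def dollars(cents):
    """Format an amount in cents as dollars."""
    return f"${cents / 100:,.2f}"


print("Partial subtotal:", dollars(partial_subtotal))
print("5% fixed charge on partial subtotal:", dollars(fixed_charge))
```

Once the remaining personnel and mailing estimates are added to the subtotal, the 5% fixed charge should be recomputed on the full total.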




MONITORING DURING EARLY IMPLEMENTATION EFFORTS



An adequate program monitoring system should continuously
provide program directors and their supervisors with
data on (1) the degree to which planned activities are
being implemented as intended, (2) the degree to which desired
outcomes are occurring as intended, and (3) any unplanned
outcomes (i.e., positive and negative side effects).

An essential element in planning a monitoring system
is the "feedback" mechanism, by which process, outcome, and
side-effect data are transmitted immediately to program
staff who have the authority to make modifications in the
program. As is the case with pilot tests, this means immediate
processing of results in relatively simple and uncomplicated
formats.

In order to facilitate this feedback, it is necessary 
to plan in advance for the administration of process and 
outcome measures, selected or developed according to speci- 
fications which are described in the next section of this 
module. 

Because the concepts of early program monitoring so 
closely match those of pilot testing activities, differing 
mainly in the degree to which activities are separated in 
time, no additional discussion will be presented in the 
text at this time. 



Feedback



Similarity to
Pilot Testing



MEASUREMENT TECHNIQUES 



A. Four basic measurement techniques

B. Considerations of adequacy



In conducting activity tryouts and monitoring early
program implementation, it is essential to collect data
which show the degree to which both process and student
performance objectives have been achieved, as well as data
on side effects. Measurement techniques of choice will
differ according to the type and specificity of the outcome
to which they are applied. For example, process objectives
are usually stated with such directness and precision that
measurement of their attainment may involve the construction
of a simple checklist of "Yes, it was done," "No, it wasn't"
items. An example of some process objectives of this sort
and a checklist to measure their attainment are included as
Table 1. Similarly, a student performance objective which
calls for "an increase of x points on Test Y" involves the
fairly straightforward administration of Test Y. Attainment
of student performance objectives which call for changes in
observable student behaviors may often be measured simply by
rearranging the objective into one or more test items which
presumably will then measure attainment of the objective.
All measurement techniques, however, should be constructed
to meet basic adequacy criteria. These criteria will be
discussed shortly.



Measuring Attainment
of Process Objectives:
a Simple Checklist



TABLE 1 

Community Involvement Component Process Objectives

1. By May 15 the Community Council, composed of representatives
from local organizations, school faculty, and administrative
members, will be established.

2. By July 1 the target groups to participate in the program will
be selected.

3. By July 1 the project schedule will be approved.

4. After federal acceptance of the project for funding, members
of the Community Council will be involved in interpretation
of the project to the community.

5. After federal acceptance of the project for funding, the Community
Council will supply information from interested citizens
to the director regarding community reaction to the project.

6. By August 1 the Community Council will approve the district's
selection of project staff.

7. During the course of the project, the Community Council will
be called upon by the Project Director to assist in acquiring
volunteers.






Community Involvement Component Process Objectives Checklist

By May 15 the Community Council, composed of representatives
from local organizations, school faculty, and administrative
members, was established.

yes
no

By July 1 the target groups to participate in the program
were selected.

yes X
no

By July 1 the project schedule was approved.

yes
no

After federal acceptance of the project for funding, members
of the Community Council were involved in interpretation of
the project to the community.

yes X
no

After federal acceptance of the project for funding, the
Community Council supplied information to the director from
interested citizens regarding their reaction to the project.

yes X
no

By August 1 the Community Council decided upon the district's
selection of project staff.

yes
no

(This is to be accomplished by September 30)

During the course of the project to this time, the Community
Council has been called upon to assist in acquiring volunteers
when the need arises.

yes X
no
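A checklist of this kind is easy to keep and tally in a small program. The following is a minimal sketch, not part of the module: the class and function names are hypothetical, and the three entries with their yes/no marks are invented for illustration rather than copied from the checklist above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChecklistItem:
    objective: str
    done: Optional[bool] = None  # True = "yes", False = "no", None = not yet marked


def summarize(items):
    """Count yes, no, and unmarked process objectives."""
    summary = {"yes": 0, "no": 0, "unmarked": 0}
    for item in items:
        if item.done is True:
            summary["yes"] += 1
        elif item.done is False:
            summary["no"] += 1
        else:
            summary["unmarked"] += 1
    return summary


def needs_follow_up(items):
    """Return objectives that are not confirmed as done."""
    return [item.objective for item in items if item.done is not True]


# Hypothetical entries and marks, for illustration only
checklist = [
    ChecklistItem("Council established by May 15", True),
    ChecklistItem("Project schedule approved by July 1", False),
    ChecklistItem("Staff selection approved by August 1"),
]

print(summarize(checklist))
print(needs_follow_up(checklist))
```

Keeping "unmarked" separate from "no" avoids treating an objective that has simply not been checked yet as one that was not carried out.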






Certain widely used measurement techniques are the
basic tools of educational evaluation. Although there are
roughly as many measurement techniques as there are measurement
specialists (perhaps more, due to a rich legacy left by
some of the legendary specialists of the past), mastery of
certain basic techniques will permit the assessment of most
tryout outcomes. It is estimated that 95% of the data needs
of activity tryouts and early monitoring can be satisfied
by the application of one of the following four techniques.

The Four Basic Techniques

Paper and pencil instruments. Paper and pencil instruments
include standardized and non-standardized tests,
questionnaires, checklists, rating scales, etc. In general,
they pose some fixed question and require a written response
from all respondents. They have the advantages of ease of
administration and scoring and can easily be adapted to
measurement of both cognitive and affective (attitudinal)
outcomes.

Performance tests. Performance tests pose some fixed
question or situation and require all respondents to do
something. The consequent response is then observed and
scored according to predetermined standards. Performance
tests are especially adaptable to measurement of skill and
competency outcomes which require more complex responses
than those required on paper and pencil instruments.

Paper and Pencil
Instruments



Performance Tests



Behavioral
Observations



Behavioral observations. Behavioral observations,
unlike performance tests, do not involve questions or posed
situations. Observations are usually performed by trained
observers in a free-response situation, where the person
or persons being observed are not instructed in any systematic
sense. Scoring is usually performed by the observer,
who categorizes behaviors into discrete predetermined units
and tabulates them as a function of time. During activity
tryouts, it may be extremely useful to schedule periodic
observations of the participants as they perform to get
clues about portions of the activity that are frustrating,
boring, too time consuming, etc.
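The tabulation of categorized behaviors as a function of time can be sketched as follows. This is an assumed illustration only: the behavior categories, the five-minute interval, and the sample observations are all hypothetical.

```python
from collections import Counter

# Hypothetical predetermined behavior categories
CATEGORIES = {"off-task", "asks for help", "completes step"}


def tabulate(observations, interval_minutes=5):
    """Count each predetermined behavior category per time interval.

    observations: list of (minute_observed, category) pairs.
    Returns {interval_index: Counter of categories}.
    """
    table = {}
    for minute, category in observations:
        if category not in CATEGORIES:
            raise ValueError(f"Unknown behavior category: {category}")
        slot = minute // interval_minutes
        table.setdefault(slot, Counter())[category] += 1
    return table


# Hypothetical observation log for one participant
observations = [
    (1, "off-task"),
    (3, "asks for help"),
    (4, "off-task"),
    (7, "off-task"),
    (12, "asks for help"),
]

table = tabulate(observations)
for slot in sorted(table):
    start = slot * 5
    print(f"minutes {start}-{start + 4}: {dict(table[slot])}")
```

Rejecting categories that were not defined in advance keeps the observer from inventing new units in the middle of a session, which would make tallies from different sessions hard to compare.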

Interviews. Interviews require direct communication
between the respondent and the data collector (interviewer).
The interviewer's questions may permit a range between wide
response options (unstructured or "open-ended" questions) or
limited response options (structured questions). Interviews
are generally most valuable if administered after implementation
efforts to acquire information about possibly unanticipated
outcomes.



Interviews



Considerations of adequacy. Paper and pencil tests
are the most widely used measurement techniques, especially
standardized,* commercially available ones. However, small-scale
activity tryouts are more likely to require unstandardized,
user-constructed paper and pencil tests, performance
tests, observations, and interviews. Because only
standardized tests usually contain built-in attention to
problems of measurement adequacy, this makes it even more
crucial for users at this stage to be aware of and to conscientiously
apply certain well-accepted considerations to
avoid misinterpretation of measured results. The application
of such considerations of adequacy to measurement
techniques is nothing more than an attempt to keep you
from fooling yourself and others. You should apply these
considerations whenever you select or construct outcome
measures.

Objectivity. Will the technique yield the same score
regardless of who is applying it?

It should be recognized that all measurement techniques
have some deficit in objectivity. In other words, there is
a continuum running from relatively objective methods to
relatively subjective methods. This same principle also
applies to reliability, validity, efficiency, and reactivity,
which will be discussed subsequently. Objective
methods are those which yield similar scores no matter who
is doing the scoring. Subjective methods are those which
can yield widely disparate scores when applied by different
persons.

* Standardized here refers to the fact that the tests have
been administered to a large (usually representative)
reference (or norm) group whose score distribution permits
comparative interpretation of new scores; e.g., determining
how Johnnie's test score compares to the national norm.



Objectivity

Develop Scoring
Rules or Keys

Remove Identifying
Information



In order to improve objectivity, it is necessary to
establish scoring rules which enable unequivocal assignment
of scores to each obtained response. For example, an
"objective" paper and pencil test may require a very discrete
written response, such as "True/False" or "a, b, c,
or d" which, with the aid of a test key, can then be scored
"right" or "wrong" by virtually everybody who might be
scoring that response. A less objective essay test item
requires a less discrete written response, and consequently
there may be considerable disagreement among scorers as to
the numerical value of that response. Objectivity can be
improved for this type of item by establishing a type of
key which states the general scoring rules, then lists
examples of typical responses and the proper scores for
each.

Another factor in improving objectivity is to remove
identifying information from each response, so that a scorer
will not know who wrote it. This helps control such threats
to objectivity as the "halo" effect. In the "halo" effect,
the responses of a person known to the scorer tend to be
scaled according to the scorer's general opinion of that
person rather than on the merit of the response. Information
on identity can be removed by using code numbers
rather than respondent name on all scorable documents.

All things being considered, paper and pencil tests
are usually more objective than performance tests, observations,
and interviews. During activity tryouts, however,
you are often more interested in obtaining
intuitive or idiosyncratic judgments which are not amenable
to highly objective scoring. Thus, even relatively subjective
techniques may have an important role at this stage
if proper care is taken in interpreting their outcomes.
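The two objectivity aids discussed in this section, a scoring key and code numbers in place of names, can be sketched as below. This is a hypothetical illustration: the item numbers, answers, respondent names, and the "R001"-style code format are all assumptions, and the fixed random seed is used only so the example is repeatable.

```python
import random


def score_with_key(responses, key):
    """Score discrete responses against a test key: one point per match."""
    return sum(1 for item, answer in responses.items() if key.get(item) == answer)


def assign_code_numbers(named_responses, seed=0):
    """Replace respondent names with code numbers in shuffled order.

    Returns (coded_responses, lookup). The lookup table links codes back
    to names and should be kept away from the scorers.
    """
    names = sorted(named_responses)
    random.Random(seed).shuffle(names)
    coded, lookup = {}, {}
    for number, name in enumerate(names, start=1):
        code = f"R{number:03d}"
        coded[code] = named_responses[name]
        lookup[code] = name
    return coded, lookup


# Hypothetical key and responses
key = {1: "T", 2: "F", 3: "c"}
named = {
    "Ann": {1: "T", 2: "F", 3: "a"},
    "Ben": {1: "T", 2: "F", 3: "c"},
}

coded, lookup = assign_code_numbers(named)
scores = {code: score_with_key(resp, key) for code, resp in coded.items()}
by_name = {lookup[code]: score for code, score in scores.items()}
print(scores)
```

Because the key fully determines each score, any scorer applying it to the coded sheets should arrive at the same numbers; an essay item would instead need written scoring rules with example responses.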

Reliability. Does the technique produce data which are
free from random error and thus yield a relatively "constant"
score?

Reliable measurement techniques are those which yield
constant scores which are relatively free from chance variation
over time. For example, if you measure an object
twice with a wooden ruler, the results are likely to be
very consistent, or reliable. On the other hand, if you
used a rubber band to perform such multiple measurements,
you may observe considerable inconsistency; the rubber band
is a relatively unreliable instrument for measuring length.
Lack of objectivity can lead to low reliability. Trick
questions or inaccurate recording devices (e.g., unwound
clocks, inaccurate test keys) can also lead to low reliability.

Reliability



Application
Considerations



A more important consideration in improving reliability
is the way measurement techniques are applied. An application
which poses the same question or situation under
identical conditions to all persons being measured is
generally more reliable. If possible, then, instructions
and conditions should be the same for all examinees.
Questions, or test items, should be phrased so that they
discourage wild guessing. All examinees should be familiar
with the type of response being requested. Always give a
few practice items or a warmup period so that performance
will not be unduly influenced by unfamiliarity with the
type of item being used. Finally, combining several items
usually produces a more reliable score than a single item.
For example, if you are attempting to measure mastery of a
student performance objective, a more reliable estimate of
mastery can be obtained by requiring satisfactory performance
on four out of five items rather than on one item.
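One way to see why the four-out-of-five rule is more reliable than a single item is to ask how often a student who is merely guessing would appear to have mastered the objective. The sketch below is a rough illustration under assumptions that are not in the module: each item has two answer choices, items are independent, and the student guesses blindly with a 0.5 chance per item.

```python
from math import comb


def has_mastered(item_results, required=4):
    """Apply the mastery rule: at least `required` items answered correctly."""
    return sum(bool(r) for r in item_results) >= required


def chance_of_passing_by_guessing(n_items, required, p_correct=0.5):
    """Binomial probability that blind guessing meets the mastery rule."""
    return sum(
        comb(n_items, k) * p_correct**k * (1 - p_correct) ** (n_items - k)
        for k in range(required, n_items + 1)
    )


single_item = chance_of_passing_by_guessing(1, 1)
four_of_five = chance_of_passing_by_guessing(5, 4)

print(has_mastered([True, True, False, True, True]))
print(single_item, four_of_five)
```

Under these assumptions a guesser passes a single item half the time but meets the four-of-five rule less than one time in five, so chance contributes much less to the multi-item score.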

Validity. Does the technique actually measure what it
is supposed to be measuring, intuitively and by empirical
demonstration, and thus yield a relatively "true" score?

Validity

For present purposes, we are interested mainly in that
aspect of validity which refers to the congruence between
an underlying characteristic or objective and an obtained
score. For example, if a score indicates a student has
mastered an objective, has he in fact mastered it? In this
connection, we may be interested in knowing two basic facts.
First, is the obtained score a measure of the desired
characteristic and not some other extraneous or peripherally
related characteristic? If we are interested in
measuring, say, "knowledge of occupational opportunities,"
does our technique measure this and not reading ability,
test-taking ability, etc.? Second, is the obtained score
adequately representative of the entire characteristic
and not some very limited aspect of the characteristic?
In the example posed above, does our technique measure
knowledge of a broad range of occupational opportunities or
just a few limited ones? It is desirable to try to ensure
that measurement techniques are both closely related to and
broadly representative of the characteristic being measured.

Construct Explicit
Rationales



The first step in doing this is to construct an
explicit rationale for each measurement technique being
utilized, to show in writing how the technique relates to
the objective or characteristic being measured. For the
above example, suppose one student performance objective
for a career guidance program is for "participating students
to be able to describe vocational and educational opportunities
which are available to them." There are many ways
to measure attainment of such an objective; in fact, paper
and pencil tests, performance tests, behavioral observations,
and/or interviews might all be adapted to the task. Assume
we decide to adopt a paper and pencil item such as "List
at least two educational opportunities open to you." The
explicit rationale for this item might look something like
the following:

Objective is primarily cognitive, of basic knowledge
type. Demonstration of attainment requires a
constructed response (as opposed to a selected response
or multiple-choice), with no cues or prompts.
Response must show some awareness of the practical
range of opportunities open to each individual respondent,
e.g., a youth with a C- grade point average
and no course work in math should not list a college
pre-med course as an educational opportunity. This
requires scoring by persons knowledgeable about each
respondent. Additional scoring criteria must be
designed to eliminate overlap (e.g., require "discrete"
statements of opportunity) and require adequate specificity
(e.g., "attend Foothill College in the general
studies program," not "go to college").






The written rationale tends to (1) force attention to
important indicators of the objective (away from more trivial
indicators), (2) highlight practical administration and
scoring details which might otherwise be overlooked, and
(3) ensure that the required response is within the capability
of the examinees.

Measure All Known
Aspects of Objective
or Characteristic

The second step in helping to ensure validity is to
provide sufficient measures of each important objective or
characteristic. In the above example, the designated paper
and pencil item, while probably an acceptable one, is insufficient
in itself to adequately measure attainment of the
stated objective. At the very least, another item on vocational
opportunities would be required. Scores on the
multiple items would then be summed to measure attainment
of the objective.

In general, measurement techniques which are relatively
objective and reliable are more valid.

Efficiency. Is the technique relatively cheap and
easy to administer (at least within the capabilities of
the person who is performing the measurement) and score?

Efficiency

Relatively efficient techniques are those which yield
reliable and valid scores at a low cost in terms of money
and examiner and examinee time. In general, this means
that instruments which can be administered to groups rather
than individuals, once rather than on multiple occasions,
and under normal rather than contrived circumstances, are
more efficient. It also means that exercises which can be
scored and processed quickly (e.g., with a test key, a
behavior frequency tabulator, etc.) and easily (e.g., by a
clerk, a test-scoring machine, etc.) are more efficient
than those which require more time (e.g., rating an essay,
analyzing the content of a tape recording) and expertise
(e.g., by a committee of experts). Efficiency is especially
important in measuring the effects of activity tryouts
because the results are usually needed for immediate application
in modifying the activity before actual implementation.

Non-reactivity. Does the technique unduly influence
the subsequent behavior of the person to whom it is being
applied?

The classic example of a highly reactive technique is
uprooting a seedling daily to measure its growth. Reactivity
is to be avoided unless the reactive effect of the
measure is designed to be part of the activity being scrutinized,
e.g., when a pre-test is designed to sensitize
students to the material which they are supposed to learn.

Relatively non-reactive measures include physical
traces of past events, routinely collected records or statistics,
and unobtrusive observations. An example of using
physical traces as a measure is the case of a well-known
research firm which made daily records of the number of
cigarette butts found in ashtrays to estimate the effects
of a lung cancer prevention campaign. An example of using
routine records as a measure is the use of school attendance
figures as a measure of the effectiveness of an anti-truancy
program. Unstructured observations of such things as time
to complete an activity and frequency of certain behaviors
(like long disgusted sighs accompanied by throwing pencil
down on desk) can often tell more than any other measure
about the effectiveness of an activity. Non-reactivity is
promoted when the observer is placed so as not to intrude
into the situation s/he is observing.



Non-reactivity



The number of possibly useful non-reactive or unobtrusive
measures is limited only by your imagination. Since
the validity of such measures often tends to be low, however,
multiple measures may become necessary, and special care
must be taken in the construction of sound explicit
rationales.



ACTIVITY 6 - "REALLY UNDERSTANDING" MEASUREMENT
AND MEASUREMENT TECHNIQUES



Please read the following passage from Fred Kerlinger's 1964
book on the foundations of behavioral research:

"In its broadest sense, measurement is the assignment of
numerals to objects or events according to rules." This
definition of measurement succinctly and accurately expresses
the basic nature of measurement. To understand the definition,
however, requires the definition and explanation of each important term
--a task to which much of this chapter will be devoted.

Suppose that we ask a male judge to stand seven feet away from an
attractive young woman. The judge is asked to look at the young woman
and then to estimate the degree to which she possesses five attributes: niceness,
strength of character, personality, musical ability, and intelligence.
The estimate is to be given numerically. In the number system a scale of
numbers from 1 through 5 is used, 1 indicating a very small amount of
the characteristic in question and 5 indicating a great deal of the characteristic.
In other words, the judge, just by looking at the young woman, is
to assess how "nice" she is, how "strong" her character is, and so on, using
the numbers 1, 2, 3, 4, and 5 to indicate the amount of each characteristic
she possesses.

After the judge is finished, another male judge is asked to repeat the
process with the same young woman. The numbers of the second judge
are checked against those of the first judge. Then both judges similarly
judge a number of other young women.

This example may seem to be a little ridiculous. Most of us, however,
go through very much the same procedure all our lives. We often
judge how "nice," how "strong," how "intelligent" people are simply by
looking at them and talking to them. It only seems silly when it is given
as a serious example of measurement. Silly or serious, it is an example of
measurement, since it satisfies the definition. The judges assigned numerals
to objects according to rules. The objects, the numerals, and the
rules for the assignment of the numerals to the objects were all specified.
The numerals were 1, 2, 3, 4, and 5; the objects were the young women;
the rules for the assignment of the numerals to the objects were contained
in the instructions to the judges. Then the end-product of their work, the
numerals, might be used to compute measures of relation, analyses of variance,
and the like.

The definition of measurement includes no statement about the
quality of the measurement procedure. It simply says that, somehow, numerals
are assigned to objects or to events. The "somehow," naturally, is
important--but not to the definition. Measurement is a game we play
with objects and numerals. Games have rules. It is of course important
for other reasons that the rules be "good" rules, but whether the rules are
"good" or "bad," the procedure is still measurement.






Why this emphasis on the definition of measurement and on its
"rule" quality? There are three reasons. First, measurement, especially
psychological and educational measurement, is badly misunderstood. It
is not hard to understand certain measurements used in the natural sciences--length,
weight, and volume, for example. Even measures more removed
from common sense can be understood without wrenching elementary
intuitive notions too much. But to understand and accept the fact
that the measurement of such characteristics of individuals and groups as
intelligence, aggressiveness, cohesiveness, and anxiety involves basically
and essentially the same thinking and general procedure is much harder
to do. Indeed, many say that it cannot be done. Knowing and understanding
that measurement is the assignment of numerals to objects or events
by rule, then, helps to erase erroneous and misleading conceptions of psychological
and educational measurement.

Second, the definition tells us that, if rules can be set up on some
rational or empirical basis, measurement of anything is theoretically possible.
This greatly widens the scientist's measurement horizons. He will
not, in short, reject the possibility of measuring some property because
the property is, say, a complex and elusive one. He understands that measurement
is a game that he may or may not be able to play with this or
that property at this time. But he never rejects the possibility of playing
the game, though he may realistically understand its difficulties.

Third, the definition alerts us to the essential neutral core of measurement
and measurement procedures and to the necessity for setting up
"good" rules, rules whose virtue can be empirically tested. No measurement
procedure is any better than its rules. The rules given in the example
above were poor. The procedure was a measurement procedure;
the definition was satisfied. But it was a poor procedure for reasons that
should become apparent later.

This passage was quoted in its entirety because it represents
an extremely nice (if slightly chauvinist--remember it
was done in 1964) job of presenting the definition of measurement
and some of the problems associated with it. Anything is theoretically
measurable, but not necessarily with complete accuracy.
Quality of measurement is relative, and measurement techniques
yield relatively better or poorer scores according to the way
they are applied and the use to which the scores are put.

In the text, we talked about four basic types of measurement
techniques. Each type has many variations and each variation has
certain rules of construction which, if followed, will make the
scores better measures.



4 Fred Kerlinger, Foundations of Behavioral Research (New York:
Holt, Rinehart, and Winston, Inc., 1964), pp. 411-412.



In order to practice your skills in constructing evaluation
measures, please read the following situation.



You are the guidance counselor at an elementary school. You
have developed an activity designed to help students who come to
the school nurse's office with minor injuries caused by in-school
accidents. The goal of the activity is to help students to recognize
things they can do to prevent such accidents. This is just
one subcomponent of your overall guidance program goal, which is
to help students to overtly identify sources of pain and frustration
and formulate and practice realistic steps to eliminate or
reduce them. The desired outcome of the new activity is that each
student who reports to the school nurse's office with a minor
injury will be able to describe, to the satisfaction of the nurse:
(1) the immediate cause of the injury, (2) what s/he did prior to
the injury, (3) how this act led directly to the injury, (4) what
s/he might have done differently to avoid the injury, and (5) what
s/he can do in the future to avoid similar injuries. The activity
will involve giving all nurses a set of structured questions to
ask each student who comes in with a minor injury. The questions
will elicit responses consistent with each aspect of the objective
stated above and will be repeated in slightly different forms until
each student has voiced a suitable response.







Please write one process objective of the activity. Write an item
to measure its attainment.

Please write one student performance objective of the activity. 
Write an item to measure its attainment using each of the following 
techniques, showing particular concern for objectivity, reliability, 
and validity.

1. A paper and pencil instrument 

2. A performance test 

3. An observation of student behavior 

4. An interview 

Are there likely to be some unanticipated outcomes? What might 
be one? Indicate how you would measure its attainment. 






RESULTS OF PRELIMINARY ACTIVITY TRYOUTS
AND EARLY IMPLEMENTATION 

If desired outcomes are achieved with a minimum of 
undesired side effects, improvements in efficiency and 
effectiveness can be expected during later implementation 
efforts. 

However, if desired outcomes are not achieved or are 
accompanied by undesired side effects, further analysis is 
indicated. Closer analysis of the process and student
product outcome data is required to determine if (1) outcome
deficits were probably caused by improper implementation of 
an activity, (2) outcome deficits were probably the result 
of an underlying fault in the basic design or rationale of 
an activity, or (3) outcome deficits were probably the 
result of a failure in the interactions between two or more 
activities. 

Improper implementation may often be spotted and cor- 
rected rather easily, especially when the process objective 
checklist shows crucial "No, it wasn't done" responses. 
Decisions to proceed with implementation in the face of 
negative results may even be made immediately if there is 
considerable confidence in the activity's logic and in the 
corrective action that has been taken. 

If everything was apparently done as planned, however, 
the problem is more serious. It may be necessary to select 
alternate strategies or activities to achieve the desired 
outcomes. Further tryouts may be necessary prior to large 
scale implementation. 
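
The decision sequence described above can be sketched as a short Python function. This is only an illustrative sketch: the function name, its arguments, and the wording of the recommendations it returns are assumptions made for the example and are not part of the module itself.

```python
def diagnose_tryout(outcomes_achieved, side_effects, process_checklist):
    """Suggest a next step after an activity tryout.

    process_checklist maps each process objective to True ("Yes, it was
    done") or False ("No, it wasn't done"). All names are hypothetical.
    """
    if outcomes_achieved and not side_effects:
        return "proceed with implementation"
    not_done = [step for step, done in process_checklist.items() if not done]
    if not_done:
        # Improper implementation: usually the easiest problem to correct.
        return "correct implementation of: " + ", ".join(not_done)
    # Everything was done as planned, so the design itself is suspect.
    return "review design or interactions; consider alternate activities and further tryouts"
```

For example, calling `diagnose_tryout(False, False, {"ask questions": False})` points back to the step that was skipped, while a failed tryout with every step marked done points to the activity's design.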

It is almost always preferable to discover implementation
problems and/or theoretical deficits in an activity's
logic prior to large scale implementation. All too frequently,
well-conceived and costly guidance programs initially
fail to achieve their goals because of the failure
of one important component. By then student, parent, and
community expectations may have been seriously deflated and
even effective remedial action may arrive too late to save
the program.




Possible Defects 



"It 's worth it," 



SUMMARY



These readings have briefly discussed the valuable practice
of conducting preliminary tryouts and monitoring early
implementation efforts. A quantitative rating scheme was suggested to
help you determine which activities are most in need of pilot
testing prior to implementation. The ratings consider aspects of
activity replicability, revisability, importance, and uncertainty
of effect. Prior to carrying out empirical pilot tests, the
internal logic of the activity should be verified. Actual pilot
testing involves implementing the planned activity under
well-controlled circumstances, using representative members of the
target population, and collecting information on degree of process
and student performance objective attainment, side effects, time
for completion, and costs. Monitoring involves collecting similar
data during early implementation efforts for combined activities.
Outcome measurement is facilitated if certain standard techniques
are utilized with consideration for well-accepted criteria for
adequacy. Inferences about the effects of implementation are
facilitated by the use of a simple evaluation design. Deficits
in desired effects, if discovered at this stage, may be corrected
in time to optimize the career guidance program of which the
activity is a part.






DISCUSSION QUESTIONS 

The following questions are designed to let you check on the knowledge 
you have gained and discuss with the Coordinator and those in your group 
any remaining questions or problems. Discussion may range to other ques- 
tions and issues related to the module also, if you desire. 

1. What is formative evaluation? 

2. What are two types of career guidance activities for which tryouts 
or pilot tests are not useful? 

3. What are two types of career guidance activities for which tryouts 
are most useful?

4. What steps should be taken in reviewing the internal logic of a 
planned activity? 

5. What is the general strategy of pilot testing? 

6. Why are sample groups for activity tryouts rarely chosen by random 
selection? 

7. What is "evaluation design"? 

8. What are four considerations of measurement adequacy? 

9. What is a way of improving a measure on each of the considerations 
of adequacy you discussed in question 8? 

10. What are the major expense categories to consider? 






POSTASSESSMENT 



1. Please describe the primary purpose of carrying out activity tryouts.

2. Please list two situations in which pilot testing of activities is
useful and two in which it is a waste of time.

Useful: 1.

2.

Waste of Time: 1.

2.

3. One of the process objectives for a career guidance program is the
following.

By June 30, all the books and materials required for the
special study carrel will be either ordered or in hand.

Please construct a checklist item for measuring attainment of this
process objective.

4. The box below contains a brief description of the trial implementation
of a career guidance activity. Please develop a paper and pencil test
item, a performance test item, a behavioral observation, and an
interview item to measure the intended outcomes of this activity. These
can be brief and to the point.



In order to estimate the probable effectiveness of the
proposed handbook about vocational training programs, Ms. Longpause,
the head of the Fliver High School Guidance Department, decided
upon a small scale tryout of a prototype handbook. She personally
prepared a two-page description of the goals, current enrollment,
minimum entry requirements, length, and 1971 graduate placement
[...] Electronic Computer Programming.

She asked the first ten students who came into her office for
appointments the next Monday morning to individually read the
description. She then asked them to take the description home
and discuss it with their parents.

5. Check your responses to question 4. How could the paper and pencil
item you wrote be made relatively objective?

6. Would the four items you listed in question 4 above constitute a
relatively valid measure of the attainment of the objectives of
the vocational training handbook? Why or why not? How could they
be improved?

7. Later that week, Ms. Longpause decided to check on unanticipated
effects by calling the parents of each of the ten students to see
if any had discussed the Abacus program at home. How could she
secure more reliable information from these calls?




APPLICATION 



You are now ready to map out a plan for applying the skills you 
developed in this module to your own setting. If you are not working 
in a particular setting and would like a hypothetical setting with 
which to think through this Application, turn to the "Optional Group 
Simulation Description" located in the Appendix. Thinking of the
sections of the module and the questions you have discussed, consider 
the tasks that must be accomplished to try out activities and monitor 
early implementation efforts for your program, the person who should 
be primarily responsible for each task, and the date by which the 
task should be done. Since these tasks must be performed after the 
process objectives have been established, the dates you choose must
coordinate with those set for earlier program planning tasks. If 
possible, this exercise should be done under the direction of someone 
from your district who would be good at taking charge of this effort 
and who would like to do it. Use the chart on the following pages for 
listing your tasks, the people responsible, and the completion dates. 
The major headings are already noted on the chart. Use additional 
paper as needed. 






APPLICATION

TASKS                                 INDIVIDUAL(S)      COMPLETION
                                      RESPONSIBLE        DATE

I.   Determine when to pilot test.

II.  Devise a general strategy of
     pilot testing.

III. Estimate costs.

IV.  Determine the best measurement
     technique.



APPENDIX 



OPTIONAL GROUP SIMULATION DESCRIPTION



The description below is intended as a frame of reference for those 
of you who are not working in a particular setting. You may wish to 
describe one of the other schools that might be in the "Lillington School 
District," to change parts of this description, or to use it for ideas in 
creating your own setting. Feel free to modify it to meet your particular 
needs. 

The Lillington School District is a suburban district made up of seven 
elementary schools, four junior high schools, and two high schools, with 
a total school population of 10,000 students. The district is rapidly 
changing from one that is largely rural to one that is urbanized. It has 
several active industries, including a furniture factory and an automobile 
assembly plant. The minority population is growing, and the unemployment 
rate is higher than that in the rest of the state. 

Lillington's Chester Arthur High School has an enrollment of about 
1,600 students in grades 9-12. Approximately 75% of the students are from 
lower and lower-middle class families. The racial/ethnic composition is 
approximately 82% white, 11% black, 5% Spanish-surname, and 2% Oriental and 
Indian. The school has experienced some group conflicts, but none serious 
to date. 

No follow-up studies have been conducted in recent years, but the school 
counselors estimate that about 40% of their graduates continue their
education, mainly at the nearby community college. The dropout rate there and in
Arthur High is unknown. The counselors guess that at least half of the 
girls marry within a year after graduating from Arthur High; most of them do 
not seek jobs until after they have had children. Most of the remainder of 
the female graduates and a large majority of the males seek jobs locally, 
principally in the furniture and auto assembly plants. No information is 
available as to their success or failure in obtaining jobs. 

Chester Arthur High School is headed by a fairly pragmatic principal 
and a progressive, forward-looking vice-principal. The teaching staff is 
a mixture of older, conservative individuals who want to avoid most "new 
fangled ideas" and young, liberal instructors who are eager for change.
Each group has its well-liked leaders.






The guidance staff consists of the director of guidance, who is a 
frustrated Freudian psychoanalyst, and two counselors: a middle-aged 
ex-space engineer who loves paper work and a young vegetarian who likes 
to "rap with kids." Two part-time aides assist with filing and other 
clerical tasks. The department has never formulated its goals and
objectives, and its principal guideline is "How have we handled that in the past?"
The department seems to be drifting, each member "doing his own thing"
without cooperation or coordination. 

The school's efforts in career education are haphazard. Some teachers, 
it is thought, include in their classes some career information about the 
subject area. Counselors supply specific information to students who 
request it. The school library includes a file of occupational information, 
but it is rarely used and poorly maintained. 

Chester Arthur High, as well as other Lillington District schools, has
experienced a number of problems recently. There are persistent rumors
of widespread drug sales and usage on campus. Racial tensions are increasing.
Students complain a great deal about the curriculum and the school in general.
School spirit is at an all-time low.

Parents and community members are becoming increasingly dissatisfied 
with the school system. They feel that taxes for education are far too 
high and that students are not adequately prepared for the world they face 
upon leaving school. A number of parents believe that faculty members are 
too liberal and that they should devote more time and effort to "the basics." 
Other parents insist that the school is old fashioned and that much of the 
subject matter is irrelevant and unimportant. Parents from both of these 
groups say that their children do not know what they want to do and are 
uninformed about career opportunities and job requirements. 

Arthur High graduates, complain local employers, make poor employees. 
Many employers seem to agree with a statement made by a foreman at the 
furniture plant: "All these kids care about is the paycheck, but they 
don't know how to earn it." 

Due to financial problems, there has been talk recently of having to 
reduce the school staff. One of the leading candidates for reduction is the 
guidance staff. While the staff members seem to be generally liked by both 
students and teachers, the school administrators are having trouble 
justifying the expenditure of money for so tenuous a purpose. They are 
seeking documentation and other evidence to justify retaining the 
guidance program and staff. 







REFERENCES 



There are several references which are "must" reading for those
who would like to go deeper into the area of formative evaluation in 
general and activity tryout in particular. These are as follows. 

Baker, Eva L. "Formative Evaluation of Instruction." In Evaluation 
in Education: Current Applications, edited by W. James Popham,
pp. 531-585. Berkeley: McCutchan Publishing Corp., 1974. 

This is an excellent chapter, most comprehensive yet communi- 
cative. Chapter subheadings include What is Formative Evaluation 
For?, Formative Evaluation of Instructional Prototypes, External 
Data Gathering in Prototype Tryouts, Prototype Data Sources: Summary, 
Issues in Prototype Testing, Operational Testing, Optional Exercises,
and References. 

Sorenson, Garth. "Evaluation for the Improvement of Instructional 
Programs: Some Practical Steps." Evaluation Comment 2, 4
(January 1971), pp. 13-17. A publication of the Center for the 
Study of Evaluation, UCLA. 

As the title implies, this article presents practical steps in 
the formative evaluation of educational programs. It includes eight 
"principles," examples, references, and a "Formative Evaluation 
Checklist" for use by developers of programs which have definable 
goals. 



Briggs, Leslie J. Handbook of Procedures for the Design of Instruction. 
Pittsburgh: American Institutes for Research, 1970, pp. 173-177.

The referenced pages constitute Chapter 8 of this document, 
entitled "Formative Design, Formative Evaluation, and Summative 
Evaluation." Briggs presents an outstanding case study example of 
the use of formative design and tryout procedures to bring about 
dramatic increases in the performance of learners on a multimedia 
first aid course. He also presents eight valuable suggestions for 
conducting formative evaluations and six self-test items. 

Van Dalen, D.B. and Meyer, William J. Understanding Educational
Research, Second Edition. New York: McGraw-Hill, 1966, pp. 
301-325. 

These 25 pages are the best we have found for presenting simple
straightforward instruction on the construction of paper and pencil
instruments, including questionnaires, tests, inventories, rating
scales, checklists, sociometrics, etc.; performance tests;
behavioral observations, including checklists and schedules, time sampling,
etc.; and interviews, including individual and group, structured
and unstructured.






Much more detailed coverage of the same material may be found
in chapters 26-32 in

Kerlinger, Fred N. Foundations of Behavioral Research. New York:
Holt, Rinehart, and Winston, Inc., 1964, pp. 467-602.

Additional References 

Flanagan, John C. Measuring Human Performance. Palo Alto: American
Institutes for Research, 1974. 

This text is especially valuable if you are interested in attitude 
or value measurement. It suggests that too much dependence in this 
area is placed on self-report ratings and scales and not enough on the 
observation of behavior, a valid indicator (or indirect measure) of
attitude or "value." See especially pp. 222-227 on uses of human 
performance measures in guidance. 

Greenberg, B.G. "Evaluation of Social Programs." Review of the
International Statistical Institute 36, 3, (1968), pp. 260-277.

Isaac, Stephen, and Michael, William B. Handbook in Research and
Evaluation. San Diego: Robert R. Knapp, 1971. 

A very confusing compendium of various and sundry techniques and 
principles, unless you know what you're looking for. Pages 82-91 
contain concise explanations of validity and reliability, along with 
techniques for generating quantitative estimates of these concepts; 
those could be very useful to you if anybody ever says "What was the 
reliability of your measure?!" with a sly smirk. Pages 62-63 contain
a nice discussion of measurement reactivity. Pages 92-93 contain 
useful information on designing and carrying out a mailed questionnaire 
survey. 

Popham, W. James. An Evaluation Guidebook: A Set of Practical Guidelines
for the Educational Evaluator. Los Angeles: The Instructional
Objectives Exchange, 1972.

Scriven, Michael. "The Methodology of Evaluation." In Perspectives of 
Curriculum Evaluation. Chicago: Rand McNally, 1967.

This document is the first in a series of American Educational 
Research Association Monographs on curriculum evaluation. 

Webb, Eugene J., et al. Unobtrusive Measures: Nonreactive Research
in the Social Sciences. Chicago: Rand McNally, 1966.

The bible of non-reactive measurement techniques. Also a very 
witty, knowledgeable, and well-written book. 

Weiss, Carol H. "Utilization of Evaluation: Toward Comparative Study." 
In Readings in Evaluation Research, edited by Francis G. Caro.
New York: Russell Sage Foundation, 1971, pp. 136-142. 






COORDINATOR'S GUIDE 



MODULE 10 



TRYING OUT ACTIVITIES AND MONITORING 
EARLY IMPLEMENTATION EFFORTS 



[...] the American Institutes for Research, under support by the
United States Office of Education, Department of Health, Education, and
Welfare under Part C of the Vocational Education Act of 1963.

August 1975 

Reprinted with minor revisions, July 1978



TABLE OF CONTENTS 



Coordinator's Role and Functions

Introductory Activity

Outline of Introductory Remarks

Activity Feedback

Assessment Criteria

Introduction to Application

Sample Evaluation Instruments






COORDINATOR'S ROLE AND FUNCTIONS



Your role as coordinator is crucial. It may be thought of in four 
categories. 

Set the Tone 

Set the right mood. Don't make things deadly and boring. Inject humor 
into the activities and discussions, let people joke around some and have 
fun. On the other hand, make it clear that there is a very serious purpose 
behind it all. People should be relaxed, but alert, interested, and moti- 
vated. 

Set the Pace 

Maintain the right pace. If things bog down, inject some humor, ask
some provocative questions, get a lively discussion going. Some sections
can be summarized orally to speed things, and this can be planned ahead. If 
things are going too fast and people are getting lost, slow it down, let them 
ask questions, spend time orally covering the points. Keep the flow smooth 
at junctures in the module--winding up one activity with a satisfying resolu- 
tion and easing participants into the next. Take breaks as you sense they 
are needed. Be flexible in structuring activities, adapting to individuals 
and situations as needed. Regard times listed in the "Module Outline" as 
flexible.

Facilitate

Encourage discussion and interaction from the participants. Bring out 
the shy people; don't let the aggressive ones dominate. Seek out questions 
and uneasinesses, get them into the open, talk them over, especially at the 
beginning. Watch facial expressions and body language. Be a trouble shooter. 
Spot problems and work them out. In short, act as a guide through the 
module, but try not to get in the way.

Evaluate 

Make sure participants are headed in the right direction; nudge them 
that way when they're not. Judge whether they perform adequately in the 
postassessment items, the activities which are part of the assessment 
(see the "Assessment Criteria"), and the Application. Keep a record of 
how each participant does. In general, maintain the quality level of the 
workshop. 



Specific Functions of Coordinator
Prior to workshop: 

Study the module thoroughly ahead of time. Be familiar with all par- 
ticipant materials and this Coordinator's Guide. 
At the workshop:

1. Introduce yourself to participants, and them to each other. Briefly 
explain your background and the role you will play in the module. 

2. Establish time limits (lunch, when day ends) and schedule for the 
day, and do your best to stick to them. 

3. Conduct Introductory Activity if you plan to use it (see
"Introductory Activity" herein).

4. Introduce the basic purposes and structure of the module (see "Out- 
line of Introductory Remarks"). Answer any questions. 

5. Start participants on the text. Lead discussion and practice activi- 
ties as you go through this module. Provide feedback on the practice 
activities (see "Activity Feedback" and "Assessment Criteria").
Collect evaluation data on the participants. 

6. Conduct the Postassessment. Collect the results. Evaluate each of 
the participants on her/his performance (see "Assessment Criteria"). 
Keep a written account of this.

7. Start participants on the Application (see "Introduction to
Application"). Evaluate the plan produced (see "Assessment Criteria").

8. Conduct a wrap-up session. Your tasks here are to

a. Summarize what has gone on and been accomplished.

b. Resolve any unanswered questions.

c. Point out sources for additional study. Go through the
Reference section briefly; add any sources you know of.

d. Mention any technical assistance available: experts related to
module topics to whom participants might be able to turn.

9. Throughout, observe how things go; collect suggestions for ways to
improve the module. Keep a written account of these.

10. Submit the results of 5, 6, 7, and 9 to the overall workshop director.



OUTLINE OF INTRODUCTORY REMARKS 






Learners generally do better if they are presented an overview of 
what it is they are to learn. This is your main job at the beginning of 
the module. Having reviewed the materials, briefly summarize them for 
the participants, preparing them for what is to come. Go over the Module 
Goal and Outcomes to be sure that the knowledge and skills to be gained are 
clear. Go over the Module Outline so that participants will understand how 
their time is to be structured. Review the Model and how Module 10 fits in. 
Encourage and answer any questions you can. 

No outside materials are required for this module. However, Sorenson's
article would be useful to have available for participants. Also, if at all
possible, have copies of the Van Dalen text available. (These are described
in the References.) If by any chance you have access to the Baker and Briggs
texts also, so much the better.






ACTIVITY AND DISCUSSION FEEDBACK

One of your most important functions as Coordinator is to provide 
feedback to participants as they work through the readings, discussions, 
and activities. Try to make sure they are understanding the central points 
and are able to do the things requested in the activities. To help you in 
this role, some feedback suggestions are provided below. 

Discussion on "Subset Optimization"

Try to help participants see that pilot tests normally involve small
"diagnostic" tryouts of individual program components.

Discussion on a "General Strategy" of Pilot Testing

Try to help participants derive a strategy such as the following:
The general strategy of pilot testing is to implement
the planned activity under well-controlled circumstances,
using members of the target population, collecting (a)
objective information on degree of attainment of process
and student performance objectives, and (b) any other
objective or subjective information which is likely to
be useful in indicating changes that can potentially be
made in the activity. Care should be taken to avoid
over-collection of data so that immediate diagnostic
feedback can be obtained.

Activity 1 - Analyzing a Program's Internal Logic 

This task requires a good deal of common-sense analysis, such as verifying
that process C will in all likelihood lead to attainment of performance
objective B, which can reasonably be expected to contribute to attainment
of goal A. Specifically, the written analysis should contain evidence of
consideration of the following points:

a. Are the activities described specific enough to be observed?
Can their attainment be documented?

b. Is there reason to believe that most members of the target
audience can already perform the student performance objectives,
making the planned activities unnecessary?

c. Is there an indicated time schedule or sequence for multiple
activities?

d. Are the planned activities and sequence practical with regard
to the constraints likely to exist in this situation? Are
they appropriate to the level of the target audience?

e. Are there likely to be side effects which the program developers
have probably not anticipated?

f. Do the planned activities relate to the intended student performance
objectives? Is there a reasonable chance that the activities,
if carried out as indicated, will produce the desired outcomes?
Do the proposed student performance objectives relate to the
program's stated goal or goals?



Activity 2 - Estimating Costs

The total expenses for the first year would come out something like
this. Figures may vary within a reasonable range.

COST CATEGORY                 NO. OF UNITS    RATE/UNIT     ESTIMATED COST

Personnel
  Paraprofessional            40 weeks        $150/week     $ 6,000
  Principal                   10 days         $100/day      $ 1,000
  Consultant                  10 days         $125/day      $ 1,250
  Teachers                    400 hours       $7/hour       $ 2,800
  Counselors                  200 hours       $7/hour       $ 1,400
  Secretary                   1 month         $750/month    $   750
                                              TOTAL         $13,200

Equipment/Supplies
  Film Projector              1               $600          $   600
  Filmstrip Projector         1               $200          $   200
  Screen                      1               $50           $    50
  Tape Decks                  2               $100          $   200
  Games and Materials         several         $200          $   200
  Books                                                     $ 1,000
  General Supplies                                          $   200
                                              TOTAL         $ 2,450
                                              FIRST YEAR    $   245

Other Program Expenses
  Mailing Letters             2,000           $.10          $   200
  Reproducing:
    Letters                   2,000 pp.       $.03          $    60
    Feedback form             2,000 pp.       $.03          $    60
    Report                    2,000 pp.       $.03          $    60
                                              TOTAL         $   380

Building Costs
  1 classroom                                 $1,000        $ 1,000
                                              TOTAL         $ 1,000

Fixed Administrative Charges
  $14,825 budget              5%                            $   750
  (total of above
  four categories)
                                              TOTAL         $   750

                                              SUBTOTAL      $15,575

With this information, the decision becomes much easier. It's a
matter of whether the benefit derived was worth $15,575. Since there
were approximately 1,000 users the first year, another way of stating
the question is whether it was worth $15 per student to have the career
center.
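
The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the variable names are invented for the example, and it assumes that the $245 figure is the first-year share of the $2,450 equipment total, which is the reading that makes the $14,825 budget add up.

```python
# Category totals taken from the cost table (first-year dollars).
personnel = 6000 + 1000 + 1250 + 2800 + 1400 + 750
equipment_purchase = 600 + 200 + 50 + 200 + 200 + 1000 + 200
equipment_first_year = 245      # assumed first-year share of the purchase total
other_expenses = 200 + 60 + 60 + 60
building = 1000

# The administrative charge is figured on the four categories above.
budget = personnel + equipment_first_year + other_expenses + building
admin_charge = 750              # table's figure: 5% of the budget, rounded up
total = budget + admin_charge

users = 1000                    # approximate number of first-year users
cost_per_user = total / users

print(f"Budget ${budget:,}; total ${total:,}; about ${cost_per_user:.2f} per user")
```

Keeping each category as its own sum makes it easy to see which line items drive the total when participants' figures differ from these.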



Activity 3

The feedback for this activity is in the form of some possible answers
to the questions asked. These are not the only answers and are intended
to be only suggestive of the responses that participants might generate.
Please discuss all responses in your group and bring out differing perceptions.

Process objectives would state something to the effect that the school 
nurse is to ask the following questions of all students who appear requesting 
treatment for minor school-related injuries.

What happened to cause this? What were you doing before that? Why
do you think what you did caused your accident? Could you have avoided 
the accident? How? 

Measurement of process objective attainment would involve observation
of the nurse and marking a "Yes, it was done in this case," "No, it wasn't
done in this case" checklist.
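
A tally of such a checklist might look like the following Python sketch. The function name and the sample record of eight visits are hypothetical and are included only to illustrate how the checklist responses could be summarized.

```python
from collections import Counter

YES = "Yes, it was done in this case"
NO = "No, it wasn't done in this case"

def summarize_checklist(observations):
    """Return the count of each response and the proportion marked YES."""
    counts = Counter(observations)
    total = counts[YES] + counts[NO]
    proportion_done = counts[YES] / total if total else 0.0
    return counts[YES], counts[NO], proportion_done

# Hypothetical record of eight observed visits to the nurse's office.
visits = [YES, YES, NO, YES, YES, YES, NO, YES]
done, not_done, proportion = summarize_checklist(visits)
print(f"{done} of {done + not_done} visits followed the procedure ({proportion:.0%})")
```

A low proportion of "Yes" responses would suggest an implementation problem rather than a fault in the activity's design.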

Student performance objectives would state something to the effect
that students would be able to respond vocally to the above questions in
an appropriate manner. Criteria of response adequacy might require
certain standards of probable accuracy (to avoid lying), common sense (to
avoid unlikely causes), and realism (to avoid unsound avoidance strategies).






Although the objective of the activity clearly calls for a
"performance" test as the ultimate criterion, many other measurement techniques
could be used to estimate side effects. The range of possible
alternatives is almost unlimited, so it is very difficult to suggest an answer
a priori to this question. For each of the four techniques, however,
some evidence should be provided to show consideration of the following
techniques for improving objectivity, reliability, and validity.

Objectivity. Develop scoring keys or rules. Remove
identifying information from responses. 

Reliability. Increase objectivity. Equalize testing
conditions for all examinees. Discourage random 
responding, e.g., wild guessing. Use familiar and 
uncomplicated response forms. If possible, combine 
measures to produce a composite score. 

Validity. Improve reliability. Construct explicit 
rationales for each technique. Measure all aspects of 
an objective or characteristic. 
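
Two of these suggestions lend themselves to a quick numerical check. The Python sketch below is illustrative only: percent agreement between two scorers is one simple way (our assumption, not a method named in the module) to see whether a scoring key is producing objective scores, and the composite is a plain average of several measures. The function names and sample scores are hypothetical.

```python
def percent_agreement(scores_a, scores_b):
    """Share of responses that two independent scorers rated identically."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both scorers must rate the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)

def composite_score(item_scores):
    """Combine several measures of one objective into a single mean score."""
    return sum(item_scores) / len(item_scores)

# Hypothetical ratings of five student responses: 1 = adequate, 0 = not.
nurse = [1, 1, 0, 1, 0]
counselor = [1, 1, 0, 0, 0]
agreement = percent_agreement(nurse, counselor)    # the scorers differ on one response

# Hypothetical scores for one student on four different measures.
student_composite = composite_score([1, 0, 1, 1])
```

Low agreement between scorers would be a signal to tighten the scoring rules before drawing conclusions from the scores.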

Some of the possible unanticipated outcomes include the following:
(1) students are injured less frequently at school, (2) students are injured
with the same frequency but come to the nurse's office with less frequency
to avoid a "lecture," (3) administering first aid to students becomes
so time-consuming there is always a queue outside the nurse's office,
(4) teachers send students to the nurse's office for more minor injuries
than before because students now seem to come back so "pensive," (5) the
nurse becomes a non-directive counselor and eases you out of your job.

Measurement of such outcomes involves tabulating records, inter- 
viewing students and teachers, observing behavior, and checking your 
pay envelope every month for a pink slip. 

End of Module Discussion

Again, allow for variations and individual emphases. The point here
is to review some of the major topics of the module and be sure everyone
has understood and internalized the information. Don't feel restricted to
this set of questions if others can be added that seem appropriate.

1. Formative evaluation is a process of collecting and using
information to improve the functioning of educational programs.

2. Pilot tests are not useful for activities which are (a) not
replicable and (b) not revisable.






3. Pilot tests are most useful for activities which are (1) most
critical to the overall success of the guidance program and
(2) about which there is most uncertainty in terms of
desired outcome attainment.

4. Eight steps in reviewing internal logic of an activity.

a. Do the activities relate to the desired outcome?

b. Do the activity's outcomes relate to the program's goals?

c. Can the target population already perform the activity's
objectives?

d. Can outcomes be documented?

e. Is a time sequence indicated?

f. Are the planned activities and sequence practical?

g. Are the planned activities and sequence appropriate to
the level of the target audience?

h. Are there likely to be unanticipated side effects?

5. The general strategy of pilot testing is to implement the
activity under controlled circumstances using members of
the target population, collecting (a) objective
information on degree of attainment of process and student
performance objectives, and (b) any other objective or sub-
jective information which is likely to be useful in indicating
changes that can potentially be made in the activity. Care
should be taken to avoid over-collection of data so that
immediate diagnostic feedback can be obtained.



6. Small samples from a population are rarely representative of
the extremes of that population.

7. An evaluation design is an arrangement of persons, activities,
and measures of effects such that inferences can be made
about the probable effects of the activities on a larger
group of similar persons.

8. Considerations of measurement adequacy. 

a. Objectivity. The degree to which a technique will pro-
duce the same score regardless of who is applying it.

b. Reliability. The degree to which a technique will pro-
duce data which are free from random error and thus yield
a relatively "constant" score.

c. Validity. The degree to which a technique measures what
it is supposed to be measuring, thus yielding a "true"
score.



9. Improvements in each category above. 

a. Objectivity. Develop scoring keys or rules. Remove 
identifying information from responses. 

b. Reliability. Increase objectivity. Equalize testing 
conditions for all examinees. Discourage random respond- 
ing, e.g., wild guessing. Use familiar and uncomplicat- 
ed response forms. If possible, combine measures to 
produce a composite score. 

c. Validity. Improve reliability. Construct explicit ration-
ales for each technique. Measure all aspects of an objective
or characteristic.

d. Efficiency. Use group measures. Administer sparingly.
Avoid contrived circumstances. Use scoring aids.

e. Reactivity. Use physical traces, archival records,
and unobtrusive observations.

10. a. Personnel costs 

b. Equipment and materials 

c. Other direct expenses 

d. Building costs 

e. Fixed administrative costs 
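
Item 9b above suggests combining measures to produce a composite score. For coordinators who keep their pilot test data electronically, the idea can be illustrated roughly as follows. This is only a minimal sketch in Python, not a procedure prescribed by the module: it assumes that each measure is first standardized and that the standardized scores are then averaged with equal weight, and the measure names and scores are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one measure so that measures on different scales can be combined."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # Every participant scored the same; the measure adds no information.
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


def composite_scores(measures):
    """Average the standardized scores across measures, one result per participant.

    measures: dict mapping a measure name to a list of scores,
    with participants listed in the same order in every list.
    """
    standardized = [z_scores(scores) for scores in measures.values()]
    return [mean(per_person) for per_person in zip(*standardized)]


# Hypothetical data for four participants (assumed for illustration only).
measures = {
    "paper_and_pencil": [12, 15, 9, 18],
    "interview_rating": [3, 4, 2, 5],
}
composite = composite_scores(measures)
```

Equal weighting is a design choice made here for simplicity; a program might reasonably weight one measure more heavily if it is believed to be more valid.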







ASSESSMENT CRITERIA 



Outcome 1. Measurement: Postassessment items 1 and 2.
Criteria:

Item 1. The sentence should essentially state the following:
"To bring about small improvements in important activities
which are meant to be replicable and about which there is
some uncertainty."

Item 2. Useful instances: 

1. When the activity is replicable. 

2. When the activity is revisable. 

3. When the activity is important. 

4. When there is considerable doubt about the effects 
of the activity. 

Any two of these four constitute an acceptable response. 
Waste of time: The reverse of any two of the above four. 

Outcome 2. Measurement: Activity 1. 

Criteria: As specified in the activity feedback, the written 
analysis must contain evidence of consideration of the following 
points: 

a. Are the activities described specific enough to be
observed? Can their attainment be documented?

b. Is there reason to believe that most members of the
target audience can already perform the student performance
objectives, making the planned activities unnecessary?

c. Is there any indicated time schedule or sequence for 
multiple activities? 

d. Are the planned activities and sequence practical with 
regard to the constraints likely to exist in this situation? 
Are they appropriate to the level of the target audience? 

e. Are there likely to be side effects which the program
developers have probably not anticipated?

f. Do the planned activities relate to the intended student 
performance objectives? Is there a reasonable chance 
that the activities, if carried out as indicated, will 
produce the desired outcomes? Do the proposed student 
performance objectives relate to the program's stated 
goal or goals? 

Outcome 3. Measurement: Activity 2. 

Criteria: The activity feedback section provides an extensive
example of approximately what the cost estimates should look 
like. Make sure the participant has completed the worksheet 
and estimates and that the estimate approximates that provided 
in the feedback. 






Outcome 4. Measurement: Activity 3.
Postassessment item 3.
Criteria: The skill that is important to judge here (in
both the activity and postassessment) is the ability to
write an item which measures the attainment of a process
objective. In both cases, the item should be a restatement
of the objective, with "Yes, it was done" and "No, it
wasn't" options.



Outcome 5. Measurement: Activity 3.
Postassessment items 4-7.
Criteria: The items developed in Activity 3 and on item 4
of the postassessment should be some variation on the following
themes:

a. Paper and pencil test item. "List all the minimum entry
requirements for Abacus School of Electronic Computer
Programming."

b. Performance test item. "What are the minimum entry
requirements for Abacus Tech?"

c. Observation. How long does it take each student to read
the handbook? Do students take the handbook with them
or wad it up and toss it in the trash can?

d. Interview. "What did you think of it?"

Postassessment items 5 - 7 should respond as follows:

Item 5 - Any indication that (a) scoring rules (a test key)
should be constructed, and (b) the scorers should
not be made aware of the identity of the respondents.

Item 6 - Any negative response associated with the notion
that they test knowledge and attitudes outcomes
related to only one of seven vocational training
programs in Fliverville. The validity could be
improved by writing items which measure outcomes
related to the other six vocational training
programs.

Item 7 - Ask everybody the same question, call at approxi-
mately the same time of day (or evening), chat
with everybody for a few minutes before asking
questions, ask several questions trying to get
at the same point, etc.

Outcome 6. Measurement: Application plan produced.
Criteria: The time and task analysis produced should line
out the tasks under each of the major headings listed, assign
responsibility for each task to someone, and set a completion
date for each task. In addition, the plan should be reasonably:
1) Logical - do the tasks flow in logical sequence?
2) Thorough - is it detailed enough to be helpful?
3) Feasible - is it not too detailed to be burdensome;
are the times allowed for the tasks reasonable?
4) Fair - are the responsibilities assigned equitably
and fairly?






SAMPLE EVALUATION INSTRUMENTS



On the next five pages are two sample instruments, the Module Performance
Record and the Evaluation Questionnaire for Staff Development Workshops. You
may wish to use these instruments to gather information for evaluating any
workshop in which you administer this module, and for making decisions about
future workshops. The Module Performance Record (MPR) is a form for tallying
participants' achievement of objectives. The Evaluation Questionnaire seeks
participants' opinions on four dimensions: (1) perceived value of the
workshop; (2) effects of participating in the workshop; (3) role and
performance of the coordinator; and (4) recommended improvements in the
workshop. As it now stands, the questionnaire should take participants
10-20 minutes to complete. You, as module coordinator, should complete
the MPR form based upon the results of the postassessment or other
evidence supplied by participants. If you duplicate the Evaluation
Questionnaire for participants to complete, we suggest you print it as
a four page booklet.
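
If the check marks from the Module Performance Record are later entered into a computer, the tallying it calls for might look something like the sketch below. This is an illustration only, written in Python under stated assumptions: the participant names are hypothetical, each participant's record is assumed to be the set of objective numbers checked on the form, and the number of objectives is passed in by the coordinator rather than fixed.

```python
def tally_objectives(record, n_objectives):
    """Count how many participants achieved each objective.

    record: dict mapping a participant's name to the set of
    objective numbers checked for that participant on the form.
    """
    counts = {obj: 0 for obj in range(1, n_objectives + 1)}
    for name, achieved in record.items():
        for obj in achieved:
            if obj not in counts:
                raise ValueError(f"{name}: unknown objective number {obj}")
            counts[obj] += 1
    return counts


# Hypothetical workshop record (names and check marks are assumed).
record = {
    "Adams": {1, 2, 3},
    "Baker": {1, 3},
    "Chen": {1, 2, 3, 4},
}
counts = tally_objectives(record, n_objectives=6)

# Percentage of participants achieving each objective.
percent = {obj: 100 * n / len(record) for obj, n in counts.items()}
```

Objectives that few participants achieve are natural candidates for more time or emphasis in future workshops.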






NATIONAL CONSORTIUM ON COMPETENCY-BASED STAFF DEVELOPMENT 

MODULE PERFORMANCE RECORD 



MODULE TITLE: 
WORKSHOP DATES 



WORKSHOP COORDINATOR(S) 



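Participants' Names        OBJECTIVES
(Alphabetically)           (Place a check (✓) mark for each
                           objective achieved.)

                            1    2    3    4    5    6    7
 1. ___________________    ___  ___  ___  ___  ___  ___  ___
 2. ___________________    ___  ___  ___  ___  ___  ___  ___
 3. ___________________    ___  ___  ___  ___  ___  ___  ___
 4. ___________________    ___  ___  ___  ___  ___  ___  ___
 5. ___________________    ___  ___  ___  ___  ___  ___  ___
 6. ___________________    ___  ___  ___  ___  ___  ___  ___
 7. ___________________    ___  ___  ___  ___  ___  ___  ___
 8. ___________________    ___  ___  ___  ___  ___  ___  ___
 9. ___________________    ___  ___  ___  ___  ___  ___  ___
10. ___________________    ___  ___  ___  ___  ___  ___  ___
11. ___________________    ___  ___  ___  ___  ___  ___  ___
12. ___________________    ___  ___  ___  ___  ___  ___  ___
13. ___________________    ___  ___  ___  ___  ___  ___  ___
14. ___________________    ___  ___  ___  ___  ___  ___  ___
15. ___________________    ___  ___  ___  ___  ___  ___  ___
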

Developed at the American Institutes for Research, under support by the
Office of Education, Department of Health, Education, and Welfare.
Revised May



NATIONAL CONSORTIUM ON COMPETENCY-BASED STAFF DEVELOPMENT
EVALUATION QUESTIONNAIRE FOR STAFF DEVELOPMENT WORKSHOPS 



Your responses to the brief questions in this booklet will help
us evaluate the workshop you just completed and make decisions
regarding future workshops. Please take 10-20 minutes to answer
honestly and thoughtfully. You need not sign your name but we
do need your help. Please answer each question. Thank you.



Name (Optional)
Module Title 



Date 



A. General Issues Related to the Workshop

Circle the letter (A, B, C, D, or E) of the statement which best
expresses your feeling or opinion on each item in the following list. If none
of the possible choices precisely represents your view, pick the one that


STATEMENTS 








B. Please list and briefly describe up to three major positive changes that
you have experienced in your knowledge, attitudes, or skills because of
this workshop. Continue on the back of this booklet if necessary. If you
did not experience any positive changes, please check the appropriate space.

_____ There were no positive changes.



C. Please list and briefly describe any negative effects you have experienced
because of this workshop. Continue on the back of this booklet if necessary.
If you did not experience any negative effects, please check the appropriate
space.

_____ There were no negative effects.




D. Please list and briefly describe any improvements you anticipate in your
career guidance program as a result of this workshop. Continue on the back
of this booklet if necessary. If you don't expect any improvements as a
result of this workshop, please check the appropriate space.

_____ I don't expect any improvements in my career guidance program
as a result of this workshop.




E. Please list and briefly describe any other comments on this workshop,
or suggestions you have for improving it. We are especially
interested in activities that should receive more or
less emphasis. Continue on the back of this page if necessary.






